
    The molecular dimension of microbial species: 1. Ecological distinctions among, and homogeneity within, putative ecotypes of Synechococcus inhabiting the cyanobacterial mat of Mushroom Spring, Yellowstone National Park

    © 2015 Becraft, Wood, Rusch, Kühl, Jensen, Bryant, Roberts, Cohan and Ward. Based on the Stable Ecotype Model, evolution leads to the divergence of ecologically distinct populations (e.g., with different niches and/or behaviors) of ecologically interchangeable membership. In this study, pyrosequencing was used to provide deep sequence coverage of Synechococcus psaA genes and transcripts over a large number of habitat types in the Mushroom Spring microbial mat. Putative ecological species (putative ecotypes), which were predicted by an evolutionary simulation based on the Stable Ecotype Model (Ecotype Simulation), exhibited distinct distributions relative to temperature-defined positions in the effluent channel and vertical position in the upper 1 mm-thick mat layer. Importantly, in most cases variants predicted to belong to the same putative ecotype formed unique clusters relative to temperature and depth in the mat in canonical correspondence analysis, supporting the hypothesis that while the putative ecotypes are ecologically distinct, the members of each ecotype are ecologically homogeneous. Putative ecotypes responded differently to experimental perturbations of temperature and light, but the genetic variation within each putative ecotype was maintained as the relative abundances of putative ecotypes changed, further indicating that each population responded as a set of ecologically interchangeable individuals. Compared to putative ecotypes that predominate deeper within the mat photic zone, the timing of transcript abundances for selected genes differed for putative ecotypes that predominate in microenvironments closer to the upper surface of the mat, with spatiotemporal differences in light and O2 concentration. All of these findings are consistent with the hypotheses that Synechococcus species in hot spring mats are sets of ecologically interchangeable individuals that are differently adapted, that these adaptations control their distributions, and that the resulting distributions constrain the activities of the species in space and time.

    A Catalog of Reference Genomes from the Human Microbiome

    The human microbiome refers to the community of microorganisms, including prokaryotes, viruses and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, “novel” polypeptides were defined as those that had both an unmasked sequence length > 100 amino acids and no BLASTP match to any non-reference entry in the nr subset. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (~97%) were unique. In addition, this set of microbial genomes allows for ~40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic datasets. In addition, the associated metrics and standards used by the group for quality assurance are presented.

    The importance of metagenomic surveys to microbial ecology: or why Darwin would have been a metagenomic scientist

    Scientific discovery is incremental. The Merriam-Webster definition of 'Scientific Method' is "principles and procedures for the systematic pursuit of knowledge involving the recognition and formulation of a problem, the collection of data through observation and experiment, and the formulation and testing of hypotheses". Scientists are taught to be excellent observers, as observations create questions, which in turn generate hypotheses. After centuries of science we tend to assume that we have enough observations to drive science, and enable the small steps and giant leaps which lead to theories and subsequent testable hypotheses. One excellent example of this is Charles Darwin's Voyage of the Beagle, which was essentially an opportunistic survey of biodiversity. Today, obtaining funding for even small-scale surveys of life on Earth is difficult; but few argue the importance of the theory that was generated by Darwin from his observations made during this epic journey. However, these observations, even combined with the parallel work of Alfred Russel Wallace at around the same time, have still not generated an indisputable 'law of biology'. The fact that evolution remains a 'theory', at least to the general public, suggests that surveys for new data need to be taken to a new level.

    Robust estimation of microbial diversity in theory and in practice

    Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao's estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics ("Hill diversities"), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao's estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity. Comment: To be published in The ISME Journal. Main text: 16 pages, 5 figures. Supplement: 16 pages, 4 figures.
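    The diversity measures named in this abstract can be illustrated with a minimal sketch. It assumes raw per-species counts as input and uses the standard textbook definitions of Hill numbers and the bias-corrected form of Chao's estimator; it does not reproduce the paper's lower and upper bounds, and the function names are illustrative only.

```python
import math


def hill_number(counts, q):
    """Hill diversity of order q from raw abundance counts.

    q = 0 gives observed richness, q = 1 the exponential of Shannon
    entropy, and q = 2 the inverse Simpson index.
    """
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if q == 1:
        # The general formula is undefined at q = 1; use its limit.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def chao1(counts):
    """Bias-corrected Chao1 lower-bound estimate of species richness.

    Uses only the number of singletons (f1) and doubletons (f2), which
    is why it is sensitive to the rare tail of the distribution.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


if __name__ == "__main__":
    sample = [4, 3, 2, 1, 1]  # hypothetical counts for five species
    for q in (0, 1, 2):
        print(f"Hill number q={q}: {hill_number(sample, q):.3f}")
    print(f"Chao1: {chao1(sample):.2f}")
```

    For a perfectly even sample every Hill number equals the number of species. Higher orders of q weight common species more heavily, which is consistent with the abstract's point that Shannon and Simpson diversity depend less on the unobserved rare tail than richness does.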

    The significance of nitrogen cost minimization in proteomes of marine microorganisms

    Marine microorganisms thrive under low levels of nitrogen (N). N cost minimization is a major selective pressure imprinted on open-ocean microorganism genomes. Here we show that amino-acid sequences from the open ocean are reduced in N, but increased in average mass compared with coastal-ocean microorganisms. Nutrient limitation exerts significant pressure on organisms, supporting the trade-off between N cost minimization and increased average mass of amino acids that is a function of increased A+T codon usage. N cost minimization, especially of highly expressed proteins, reduces the total cellular N budget by 2.7–10%; this minimization, in combination with reduction in genome size and cell size, is an evolutionary adaptation to nutrient limitation. The biogeochemical and evolutionary precedent for these findings suggests that N limitation is a stronger selective force in the ocean than biosynthetic costs and is an important evolutionary strategy in resource-limited ecosystems.
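    One simple way to quantify the N content of a protein sequence is the mean number of nitrogen atoms per amino-acid side chain. The sketch below assumes that metric and one-letter residue codes; the abstract does not state the paper's exact measure, so this is an illustration rather than the authors' method.

```python
# Nitrogen atoms in each amino-acid side chain (one-letter codes).
# Residues not listed carry no side-chain nitrogen.
SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}


def mean_side_chain_n(protein):
    """Mean number of side-chain N atoms per residue in a protein."""
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty protein sequence")
    return sum(SIDE_CHAIN_N.get(aa, 0) for aa in residues) / len(residues)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the study.
    print(mean_side_chain_n("RHKA"))  # N-rich
    print(mean_side_chain_n("AVLI"))  # no side-chain N
```

    Comparing this value between open-ocean and coastal protein sets would be one way to look for the kind of N reduction the abstract describes.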

    Computational Biology in Costa Rica: The Role of a Small Country in the Global Context of Bioinformatics

    Introduction: The successful development of high throughput methods for DNA sequencing, transcriptomics, proteomics, and other –omics, has contributed to the emergence of novel possibilities for the examination of complex biological systems through computational analysis. These fields have witnessed unprecedented advances in high income countries. Nevertheless, the role of other nations needs to be examined in order to delineate their contribution within the global context of bioinformatics. Previous articles have focused on the expansion of Computational Biology in Brazil and Mexico [1],[2], two of the largest Latin American countries, both of which have shown political commitment to foster their scientific development. Costa Rica is a small Central American country with a population of 4 million, with its territory 164 and 38 times smaller than Brazil and Mexico, respectively. Thus, it is interesting to visualize the possibilities and challenges of this low-income country in the context of the global bioinformatics endeavor. UCR::Vicerrectoría de Investigación::Unidades de Investigación::Ciencias de la Salud::Instituto Clodomiro Picado (ICP)

    Defining seasonal marine microbial community dynamics

    Here we describe the longest microbial time-series analyzed to date using high-resolution 16S rRNA tag pyrosequencing of samples taken monthly over 6 years at a temperate marine coastal site off Plymouth, UK. Data treatment affected the estimation of community richness over a 6-year period, whereby 8794 operational taxonomic units (OTUs) were identified using single-linkage preclustering and 21,130 OTUs were identified by denoising the data. The Alphaproteobacteria were the most abundant class, and the most frequently recorded OTUs were members of the Rickettsiales (SAR11) and Rhodobacteriales. This near-surface ocean bacterial community showed strong repeatable seasonal patterns, which were defined by winter peaks in diversity across all years. Environmental variables explained far more variation in seasonally predictable bacteria than did data on protists or metazoan biomass. Change in day length alone explains >65% of the variance in community diversity. The results suggested that seasonal changes in environmental variables are more important than trophic interactions. Interestingly, microbial association network analysis showed that correlations in abundance were stronger within bacterial taxa than between bacteria and eukaryotes, or between bacteria and environmental variables.

    Short clones or long clones? A simulation study on the use of paired reads in metagenomics

    Background: Metagenomics is the study of environmental samples using sequencing. Rapid advances in sequencing technology are fueling a vast increase in the number and scope of metagenomics projects. Most metagenome sequencing projects so far have been based on Sanger or Roche-454 sequencing, as only these technologies provide long enough reads, while Illumina sequencing has not been considered suitable for metagenomic studies due to a short read length of only 35 bp. However, now that reads of length 75 bp can be sequenced in pairs, Illumina sequencing has become a viable option for metagenome studies. Results: This paper addresses the problem of taxonomical analysis of paired reads. We describe a new feature of our metagenome analysis software MEGAN that allows one to process sequencing reads in pairs and makes assignments of such reads based on the combined bit scores of their matches to reference sequences. Using this new software in a simulation study, we investigate the use of Illumina paired-sequencing in taxonomical analysis and compare the performance of single reads, short clones and long clones. In addition, we also compare against simulated Roche-454 sequencing runs. Conclusion: This work shows that paired reads perform better than single reads, as expected, but also, perhaps slightly less obviously, that long clones allow more specific assignments than short ones. A new version of the program MEGAN that explicitly takes paired reads into account is available from our website.
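    The idea of assigning a read pair from the combined bit scores of both mates can be sketched as follows. This is a toy illustration and not MEGAN's implementation: the function names, the dictionary input format (reference name mapped to bit score), and the `top_fraction` cutoff are all assumptions made for the example.

```python
def combined_scores(hits_mate1, hits_mate2):
    """Sum bit scores per reference across the two mates of a pair."""
    combined = dict(hits_mate1)
    for ref, score in hits_mate2.items():
        combined[ref] = combined.get(ref, 0.0) + score
    return combined


def top_references(hits_mate1, hits_mate2, top_fraction=0.9):
    """References whose combined score is within top_fraction of the best.

    Returns a sorted list; an empty list means the pair had no hits.
    """
    combined = combined_scores(hits_mate1, hits_mate2)
    if not combined:
        return []
    best = max(combined.values())
    return sorted(r for r, s in combined.items() if s >= top_fraction * best)


if __name__ == "__main__":
    # Hypothetical bit scores for the two mates of one read pair.
    mate1 = {"Escherichia coli": 60.0, "Salmonella enterica": 58.0}
    mate2 = {"Escherichia coli": 55.0}
    print(top_references(mate1, {}))     # single read: ambiguous
    print(top_references(mate1, mate2))  # pair: one reference stands out
```

    In this toy case the first mate alone cannot separate the two references, while the pair supports only one of them, which is the kind of gain in specificity the abstract reports for paired reads.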

    Analysis and comparison of very large metagenomes with fast clustering and functional annotation

    Background: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. Results: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". Conclusion: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from http://tools.camera.calit2.net/camera/rammcap/.

    Streaming histogram sketching for rapid microbiome analytics

    Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using the pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosketching Using Little K-mers (HULK), which can histosketch a typical 2 GB microbiome in 50 s on a standard laptop using four cores, with the sketch occupying 3000 bytes of disk space.
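    The quantity that a histosketch is meant to approximate can be shown with a small exact computation: build the k-mer spectrum of each sample and take the weighted Jaccard similarity of the two spectra. The sketch below is not HULK and performs no sketching or streaming; the function names and toy sequences are assumptions for illustration.

```python
from collections import Counter


def kmer_spectrum(seq, k):
    """Count every overlapping k-mer in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def weighted_jaccard(a, b):
    """Weighted Jaccard similarity of two k-mer count spectra.

    Sum of per-k-mer minima divided by sum of per-k-mer maxima.
    Two empty spectra are treated as identical.
    """
    keys = set(a) | set(b)
    numerator = sum(min(a[x], b[x]) for x in keys)
    denominator = sum(max(a[x], b[x]) for x in keys)
    return numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    s1 = kmer_spectrum("ACGTACGT", 3)  # hypothetical reads
    s2 = kmer_spectrum("ACGTACGA", 3)
    print(f"weighted Jaccard: {weighted_jaccard(s1, s2):.3f}")
```

    The exact spectrum grows with the number of distinct k-mers, whereas the abstract describes a fixed-size sketch of about 3000 bytes; a sketching method trades some accuracy in this similarity for that compactness.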